Fast String Dictionary Lookup with One Error

Authors

  • Timothy M. Chan
  • Moshe Lewenstein
Abstract

A set of strings, called a string dictionary, is a basic string data structure. The most primitive query, which asks whether a pattern exists in the dictionary, is called a lookup query. Approximate lookup, i.e., deciding whether a pattern occurs in the dictionary with a bounded number of errors, is a fundamental string problem. Several data structures have been proposed to support it efficiently. Almost all solutions consider a single error, as does this work. Recently, Belazzougui and Venturini (CPM 2013) asked whether one can construct efficient indexes that support lookup queries with one error in optimal query time, that is, O(|p|/ω + occ), where p is the query, ω is the machine word size, and occ is the number of occurrences. For the problem of one mismatch and constant alphabet size, we obtain optimal query time. For a dictionary of d strings, our proposed index uses O(ωd log d) additional bits of space (beyond the space required to access the dictionary data, which can be maintained in compressed form). Our results are parameterized to give a space-time tradeoff. We present further results for lookup queries with one insertion or deletion on dictionaries over a constant-sized alphabet. These results are especially effective for long patterns.
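To make the one-mismatch lookup query concrete, here is a minimal Python sketch of a simple baseline. It is not the index described in the abstract: it uses the well-known wildcard-substitution idea, stores one hash key per character of each dictionary string, and does not reach the O(|p|/ω + occ) query bound. The names `build_index`, `lookup_one_mismatch`, and `WILDCARD` are hypothetical, and the sketch assumes non-empty strings that never contain the wildcard character.

```python
from collections import defaultdict

# Assumption: this character never occurs in a dictionary string or a query.
WILDCARD = "\0"


def build_index(dictionary):
    """Map each single-wildcard variant of a string to the strings that produce it."""
    index = defaultdict(set)
    for s in dictionary:
        for i in range(len(s)):
            index[s[:i] + WILDCARD + s[i + 1:]].add(s)
    return index


def lookup_one_mismatch(index, p):
    """Return every dictionary string within Hamming distance 1 of the query p."""
    matches = set()
    for i in range(len(p)):
        # Masking position i matches strings that agree with p everywhere else.
        matches |= index.get(p[:i] + WILDCARD + p[i + 1:], set())
    return matches


index = build_index(["cat", "cot", "dog", "cart"])
print(sorted(lookup_one_mismatch(index, "dot")))
```

Each query builds |p| masked keys of length |p|, so this baseline spends roughly quadratic time in the pattern length on hashing alone, which is the kind of cost the optimal-time question is aimed at.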


Similar resources

Improved bounds for dictionary look-up with one error

Given a dictionary S of n binary strings each of length m, we consider the problem of designing a data structure for S that supports d-queries; given a binary query string q of length m, a d-query reports if there exists a string in S within Hamming distance d of q. We construct a data structure for the case d = 1, that requires space O(n log m) and has query time O(1) in a cell probe model wit...


State-of-the-Art in Weighted Finite-State Spell-Checking

The following claims can be made about finite-state methods for spell-checking: 1) Finite-state language models provide support for morphologically complex languages that word lists, affix stripping and similar approaches do not provide; 2) Weighted finite-state models have expressive power equal to other, state-of-the-art string algorithms used by contemporary spell-checkers; and 3) Finite-state...


A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction

We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dict...


Levenshtein Distance Technique in Dictionary Lookup Methods: An Improved Approach

Dictionary lookup methods are popular for dealing with ambiguous letters that were not recognized by Optical Character Readers. However, a robust dictionary lookup method can be complex, as a priori probability calculation or a large dictionary size increases the overhead and the cost of searching. In this context, Levenshtein distance is a simple metric which can be an effective string approxima...
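As an illustration of the metric named in this abstract, the following Python sketch computes Levenshtein distance with the standard two-row dynamic program and uses it for a brute-force dictionary lookup. It is a textbook formulation and not the improved approach of the cited paper; the function names `levenshtein` and `closest_words` are hypothetical, and `closest_words` assumes a non-empty dictionary.

```python
def levenshtein(a, b):
    """Edit distance between a and b with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def closest_words(word, dictionary):
    """Return the dictionary entries at minimum edit distance from word."""
    distances = {w: levenshtein(word, w) for w in dictionary}
    best = min(distances.values())
    return [w for w in dictionary if distances[w] == best]


print(closest_words("helo", ["hello", "help", "world"]))
```

The brute-force scan compares the query with every entry, so its cost grows with dictionary size, which is the overhead the abstract refers to.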


High-error approximate dictionary search using estimate hash comparisons

A method for finding all matches in a pre-processed dictionary for a query string q and with at most k differences is presented. A very fast constant-time estimate using hashes is presented. A tree structure is used to minimise the number of estimates made. Practical tests are performed, showing that the estimate can filter out 99% of the full comparisons for 40% error rates and dictionaries of...




Publication date: 2015